Abstract

This paper describes work in progress on the use of visual scanning
behavior as an indicator of pilot workload. The study is investigating the
relationship between the level of performance on a constant piloting task
under simulated IFR conditions, the skill of the pilot, the level of mental
workload induced by an additional verbal task imposed on the basic control
task, and visual scanning behavior.


The results indicate an increase in fixation dwell times, especially on
the primary instrument, with increased mental loading. Skilled subjects
"stared" less under increased loading than did novice pilots. Sequences of
instrument fixations were also examined. The percentage occurrence of the
subject's most used sequences decreased with increased task difficulty for
novice subjects but not for highly skilled subjects.


Entropy rate (bits/sec) of the sequence of fixations was also used to
quantify the scan pattern. It consistently decreased for most subjects as
the four loading levels used increased. An exponential equation in task
difficulty was found to be a good predictor of entropy rate. When solved
for task difficulty, the equation provided an estimate of the level of task
difficulty perceived by a subject.

Piloting and number task performance measures were recorded and a
combined performance measure was computed. Skill was estimated
independently via a method based on pilot experience. These measures were
combined with entropy rate to develop a model relating performance, skill,
and mental workload. The exponential model fit the data well enough to
suggest that this approach has promise in the evaluation of interactions
among these variables.

Introduction 

The quantification of mental workload in aircraft pilots has been of
considerable interest for some time. Perhaps the chief reason for
measuring workload is to predict conditions under which task performance
will deteriorate. If such conditions could be accurately predicted, then the
nature and temporal sequence of flight procedures and of pilot/aircraft
interfaces might be arranged so as to minimize the chances of overload.
Quantitative analyses of workload remain elusive, however. What one would
like is a clear cause and effect relationship between an independent
variation in imposed workload and some reliable dependent measure.
The task of flying an aircraft is complex, however, and it has been
difficult to clarify the functional relationships between various
parameters in piloting tasks. The skill a particular individual brings to
the piloting task and the nature of the task which is performed can both be
expected to influence the "difficulty" of the task. These factors may be
further complicated by a shift in the pilot's priorities (some tasks may
be ignored while others receive full attention).


Figure 1. INTUITIVE RELATIONSHIPS BETWEEN
PERFORMANCE, SKILL, & WORKLOAD
(axis labels: PERFORMANCE, SKILL, WORKLOAD)


The problems which such inter-relationships introduce are well
illustrated when one attempts to employ task performance as an indicator of
workload. All pilots, regardless of skill, can be expected to exhibit poor
performance if the loading level is excessive. The overload situation is
relatively easy to assess, however, using subjective techniques.
Situations which involve intermediate to high levels of loading would seem
to be the ones of more practical concern; i.e., one is concerned with
minimizing the chance of a high workload approaching an overload situation.
Intuition suggests that the level of skill of the pilot may influence the
performance vs workload relationship for intermediate or marginal loading
levels. A pilot of high skill would be expected to maintain "better"
performance than a novice flyer under any loading condition short of
overload. This intuitive concept is illustrated graphically in figure 1.

The research described here uses this graphical representation of the
performance/skill/workload relationships in order to pose a number of
testable hypotheses. It will be suggested shortly that instrument scan may
be an indicator of workload and/or skill in certain types of flight
situations, a suggestion supported by both qualitative and quantitative
results. In addition, if a measure of workload based on instrument scan is
combined with independent measures of pilot skill and performance, then a
model of the hypothetical relationships in figure 1 may be developed and
tested.

Visual  Scanning  Behavior 

The pilot has many sources of information input but the most important
one during instrument flight is probably the visual pathway. Under
instrument flight conditions, some sensory inputs may even provide false
information, as in vertigo, which results from conflicting visual and
vestibular information. The pilot obtains information concerning aircraft
state by cross-checking or scanning the flight instruments. The exact
method of scanning the instrument panel varies from pilot to pilot but
there are some basic features common to a "good" scan pattern. Indeed, it
was the early study by Fitts and his associates on instrument transitions
which led to the familiar "T" arrangement of the major flight instruments
(Jones, et al., 1946).

A fundamental notion in the present work is that a repetitive piloting
task will invoke a regular visual scan (spatial/temporal pattern of eye
movements) during instrument flight. If this notion is correct, then it
may be postulated that external factors such as noise, interruptions, and
fatigue which interfere with the piloting task may produce measurable
changes in the scanning behavior. Such a measure would be particularly
attractive for quantifying workload since it would be both non-invasive and
objective.

Experimental  Design 

A series of experiments is being carried out in order to carefully examine
these ideas. The basic experiment is described in detail elsewhere (Tole,
et al., 1982) and only the salient points are repeated here. The
experiments described were performed at the NASA/Langley Research Center,
Flight Management Branch, in Hampton, Virginia, making use of their flight
simulator and oculometer facilities (Middleton, et al., 1977).

Three factors were manipulated in the experiments: 1) a piloting task
requiring a stereotyped scan path, 2) a verbally presented mental loading
task, and 3) a workload calibration side task.

We sought a representative constant piloting maneuver which might be
realistically expected to occur for periods of up to 10 minutes in actual
flight. This run length was chosen as an estimate of the minimum amount of
time required to provide a sufficient number of instrument fixations to
satisfy the assumption of steady state conditions. The Instrument Landing
System (ILS) approach is often chosen as the piloting task in studies of
workload (Waller, 1976; Krebs and Wingert, 1976; Spady, 1977). However,
the ILS approach represents a constantly changing task difficulty as
touchdown is approached (especially due to increases in glide slope
sensitivity and cost of error for course deviation). This variation in the
primary task loading makes it difficult to accurately control the amount of
mental workload on the pilot as an independent variable. It was decided
that a scenario in which glide slope sensitivity and heading were held
constant would allow the piloting task difficulty to remain relatively
constant for a long period, but nevertheless be more or less realistic.

A desktop general aviation instrument flight simulator (Analog
Training Computers ATC-510) was used to simulate these flight maneuvers.
The ATC-510 is a procedures trainer for light, single engine, fixed pitch
prop, fixed gear, IFR equipped aircraft. The simulator was equipped with a
turbulence level control which was set to the first level above calm
conditions in order to force some pilot vigilance on the flight task.

Pilot lookpoint on seven instruments (Attitude Indicator 'ATT',
Directional Gyro 'DG', Altimeter 'ALT', Vertical Speed Indicator 'VSI',
Airspeed 'AS', Turn and Bank 'TB', and Glide Slope/Localizer 'GSL') was
measured using a Honeywell oculometer system which has been substantially
modified by NASA Langley Research Center (Middleton, et al., 1977). This
device is non-invasive and allows the user to determine the time course of
eye fixations on instruments employed by the pilot and the dwell time of
each fixation to the nearest 1/30 sec.

The mental loading task was chosen so as not to directly interfere
with the visual scanning of the pilot (i.e. the task would not require the
pilot to look away from the instruments) while providing constant loading
during the maneuver. The task used required the pilots to respond to a
series of evenly spaced three-number sequences (Wittenborn, 1943) presented
to them audibly by means of a speaker. The pilot was told that he must
respond to each three-number sequence by indicating either "plus" or
"minus" according to the algorithm: first number largest, second number
smallest = "plus" (e.g. 5-2-4); last number largest, first number smallest
= "plus" (e.g. 1-2-3); otherwise, "minus" (e.g. 9-5-1).
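
The response rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `classify_sequence` is hypothetical, and it assumes the three numbers in a sequence are distinct (the text does not say how ties would be treated; here they fall through to "minus").

```python
def classify_sequence(first: int, second: int, third: int) -> str:
    """Return "plus" or "minus" for one three-number sequence.

    Assumes the three numbers are distinct; ties are not covered by
    the rule as stated in the text and default to "minus" here.
    """
    # Rule 1: first number largest, second number smallest (e.g. 5-2-4).
    if first > third > second:
        return "plus"
    # Rule 2: last number largest, first number smallest (e.g. 1-2-3).
    if third > second > first:
        return "plus"
    # Every other ordering (e.g. 9-5-1) is answered "minus".
    return "minus"
```

Applied to the three examples in the text, the function returns "plus" for 5-2-4 and 1-2-3 and "minus" for 9-5-1.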

The mental workload experienced by the pilot is taken to be inversely
proportional to the interval between number sequences. This relationship
is given by the following equation, which was chosen arbitrarily:

(1)  $TD = \frac{1}{\text{interval between number tasks}}$

where TD is equal to imposed task difficulty. The four loading levels used
in the current experiments were intervals of continuous silence (i.e.
no numbers presented), ten, five, and two seconds, which have corresponding
task difficulties of 0.0, 0.1, 0.2, and 0.5, respectively.
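
As a quick check of the four loading levels, the relationship can be evaluated directly. The sketch below is illustrative; the name `task_difficulty` and the use of `None` to represent the silent (no numbers) condition are assumptions made for the example.

```python
from typing import Optional


def task_difficulty(interval_sec: Optional[float]) -> float:
    """Imposed task difficulty TD = 1 / interval (interval in seconds).

    None stands for continuous silence (no numbers presented), which
    the text assigns a task difficulty of 0.0.
    """
    if interval_sec is None:
        return 0.0
    if interval_sec <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / interval_sec


# The four loading levels used in the experiments.
LOADING_INTERVALS = [None, 10.0, 5.0, 2.0]
difficulties = [task_difficulty(i) for i in LOADING_INTERVALS]
```

The resulting list reproduces the task difficulties quoted in the text for silence and for the ten, five, and two second intervals.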

Numbers were generated by a computer controlled speech synthesizer.
This allowed automated scoring of task accuracy, calculation of response
reaction times, and the possibility of temporal correlations of visual or
other responses with the verbal stimulus. The probabilities of occurrence
of "+" and "-" sequences were each 0.5. The pilot was instructed to give
the number task priority equal to that of the piloting task, as if the
verbal questions represented a constant rate of radio communication.
Performance was recorded by having the pilot press a 3-position rocker
switch mounted on the yoke: up for plus and down for minus.


The amount of mental loading imposed on the pilot by the number task
was calibrated using a side task (Ephrath, 1975). The runs made with the
side task were not used in the scanning analysis, however, due to the
alteration of normal scanning caused by the task. The results (Tole,
et al., 1982) from these runs confirmed the relative difficulty of the
various number intervals.

A microprocessor development system (Burns, et al., 1980) was used for
both stimulus presentation and data collection and analyses.

Performance  Measures 

Several variables were obtained from each of the two tasks in order to
allow the computation of performance scores. The scores developed ran
between 0 percent and 100 percent, with 100 percent being obtained if the
pilot never deviated from the intended path in space on the piloting task,
and if all number task sequences were answered correctly for the mental
loading number task. The scores from the piloting and the mental loading
tasks were then combined to provide a performance measure to be used in the
validation of the proposed performance/skill/workload model.

The scoring measure for the number task was computed as given below.

(2)  $\#TP = \frac{TOT - WRO - MIS}{TOT} \times 100\%$

where

#TP = mental loading number task performance
TOT = total number of stimuli presented
WRO = number of incorrect responses
MIS = number of missed responses

This score was 100 percent if the pilot answered every sequence correctly
and zero percent if a pilot either answered incorrectly or missed all of
the stimuli presented. Most subjects score nearly 100% on this task if
they have nothing else to do simultaneously.
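
The number task score of equation 2 is simple enough to sketch directly. The function name `number_task_score` and the example counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def number_task_score(total: int, wrong: int, missed: int) -> float:
    """Number task performance (#TP) in percent, following equation 2.

    total  -- TOT, stimuli presented
    wrong  -- WRO, incorrect responses
    missed -- MIS, missed responses
    """
    if total <= 0:
        raise ValueError("at least one stimulus must be presented")
    if wrong < 0 or missed < 0 or wrong + missed > total:
        raise ValueError("wrong and missed counts must fit within total")
    return (total - wrong - missed) / total * 100.0


# Hypothetical run: 40 stimuli, 4 answered incorrectly, 6 missed.
example_score = number_task_score(40, 4, 6)
```

A run with no errors scores 100 percent, and a run in which every stimulus is either missed or answered incorrectly scores zero, matching the limits described in the text.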

The raw data available for scoring performance on the piloting task
were the errors from the intended track for the glide slope and localizer
courses. Discussions with several highly skilled pilots revealed that
accuracy of tracking the glide slope and localizer might not provide a
complete performance picture. These pilots were willing to trade off
"smoothness" when the loading task became more difficult; i.e. the pilot
may perform the piloting task to the same level of accuracy, as far as
deviations from a designated path are concerned, on two different runs but
produce two very different ride qualities for these runs. One possible
measure for smoothness could be the frequency of oscillation around the
intended path. The higher this frequency is, the less "smooth" the ride
becomes. It was arbitrarily assumed that a smooth ride would contain
frequencies mostly less than 0.1 Hz. Under this assumption, measurement of
the spectral component of the aircraft dynamics above 0.1 Hz would
indicate any decrement in the ride quality.

In order to examine this measure, the power-spectral density (PSD) of
the course deviations was computed. The bandwidth of the calculated PSD
was 2.5 Hz. The "power" within a band of frequencies may be determined by
integrating the PSD over that band (Schwartz). We chose to consider
the % of the spectral power which was located in the band from 0.1 to 2.5
Hz. This was calculated by subtracting the power contained in the band
from 0 to 0.1 Hz (assuming that the D.C. component was first removed) from
the total power in the spectrum, expressing the difference as a fraction of
the total power, and multiplying by 100%. This % of the PSD was computed
for both the glide slope and the localizer and combined with the two RMS
measures to provide four candidate variables to be included in
a performance score for the piloting task.
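
The percent-of-power calculation can be illustrated with a small sketch. This is not the authors' implementation: it assumes evenly sampled course deviations, uses a plain discrete Fourier transform as the PSD estimate, and the function name `percent_power_above` and the 5 Hz sample rate (which gives a 2.5 Hz bandwidth) are assumptions made for the example.

```python
import cmath
import math


def percent_power_above(samples, sample_rate_hz, cutoff_hz=0.1):
    """Percent of spectral power above cutoff_hz, D.C. component removed.

    Uses a direct DFT over the positive-frequency bins; adequate for a
    short illustrative record, not optimized for long runs.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the D.C. component
    total_power = 0.0
    high_power = 0.0
    for k in range(1, n // 2 + 1):
        freq_hz = k * sample_rate_hz / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        # Bins below Nyquist have a mirror image; the Nyquist bin does not.
        weight = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
        power = weight * abs(coeff) ** 2
        total_power += power
        if freq_hz > cutoff_hz:
            high_power += power
    if total_power == 0.0:
        return 0.0
    return 100.0 * high_power / total_power


# Synthetic 40-second records sampled at 5 Hz (hypothetical data).
FS = 5.0
N = 200
slow = [math.sin(2 * math.pi * 0.05 * i / FS) for i in range(N)]  # 0.05 Hz
fast = [math.sin(2 * math.pi * 1.0 * i / FS) for i in range(N)]   # 1.0 Hz
mixed = [s + f for s, f in zip(slow, fast)]
```

A slow oscillation at 0.05 Hz contributes essentially nothing above the cutoff, a 1 Hz oscillation contributes essentially all of its power, and an equal mixture of the two splits evenly.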

Since the pilots were instructed to give equal priority to the
piloting task and the mental loading number task, both were included in the
development of a combined performance score. While a weighting of 0.5
might have been assigned to each task, it was decided to leave the
weighting free to allow the model fitting procedure to determine the
relative weights. A linear relationship between all of the terms was
assumed and the form of the equation became:

(3)  $P = CONST + a(\#TP) + b(RMS/GS) + c(RMS/LOC) + d(\%PWR/GS) + e(\%PWR/LOC)$

where

P = combined performance measure
CONST = constant term
#TP = mental loading number task performance
RMS/GS = RMS error from glide slope track
RMS/LOC = RMS error from localizer track
%PWR/GS = percent of power from the power-spectral density for
the glide slope greater than 0.1 Hertz
%PWR/LOC = percent of power from the power-spectral density for
the localizer greater than 0.1 Hertz

Estimation  of  Pilot  Skill  levels 

In order to assess the effects of skill on performance and mental
workload, an independent quantitative measure of skill was needed. A model
of pilot skill based on experience factors was used for this purpose
(Hollister, et al., 1973). This model was developed in order to predict the
current level of skill of pilots flying light, single engine aircraft.

(4)  $\text{Skill} = 1.42 + 0.25\,(\text{recency}) + 0.73\,\log(\text{total time}) - 0.030\,(\text{years certified}) + 0.15\,\log(\text{time in type}) - 0.0055\,(\text{age}) + e$

where

Skill = score reflecting relative piloting performance
recency = number of flight hours in past 30 days
total time = total number of flight hours
time in type = total number of hours in light single engine aircraft
years certified = time in years since last certificate or rating
age = subject's age in years
e = residual variance not explained by the model

A raw skill score was calculated for each of the pilot subjects using
the model. The pilot with the highest resulting skill score was then used
to normalize all of the scores so that skill levels would range between 0%
and 100%. Eleven subjects ranging in skill from NASA test pilots to
non-pilots participated in the experiments. The relative skill scores for
the subjects are given in Table I.
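
The skill model and the normalization step can be sketched as follows. Several points are assumptions: the text does not state the base of the logarithm (base 10 is used here), the residual term e is omitted, flight-hour inputs are assumed to be positive, and the names `raw_skill` and `normalize_scores` and the example pilot are hypothetical.

```python
import math


def raw_skill(recency_hr, total_time_hr, years_certified,
              time_in_type_hr, age_yr):
    """Raw skill score from equation 4, without the residual term.

    ASSUMPTION: log is taken as base 10; the text does not specify it.
    Hour arguments must be positive for the logarithms to be defined.
    """
    return (1.42
            + 0.25 * recency_hr
            + 0.73 * math.log10(total_time_hr)
            - 0.030 * years_certified
            + 0.15 * math.log10(time_in_type_hr)
            - 0.0055 * age_yr)


def normalize_scores(raw_scores):
    """Scale raw scores so that the highest-scoring pilot is 100%."""
    top = max(raw_scores.values())
    return {pilot: 100.0 * score / top for pilot, score in raw_scores.items()}


# Hypothetical pilot: 10 hr in the past 30 days, 1000 hr total,
# 5 years since last certificate, 100 hr in type, 40 years old.
example_raw = raw_skill(10, 1000, 5, 100, 40)
example_relative = normalize_scores({"A": 2.0, "B": 4.0})
```

Normalizing against the top score keeps the rank ordering of the raw scores while expressing each pilot as a percentage of the most skilled subject.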

NASA PILOT #    SKILL SCORE (%)

      3              100
      4               85
     11               77
     13               53
     15               39
      6               37
     12               33
     14               32
      1               22
      7               15
     16               13

TABLE I.

Relative Skill Scores of Subjects based on Equation 4


Though care must be taken when applying an equation such as this in a
different set of experimental conditions, the overall rank ordering of the
pilots by this method is probably accurate as it generally agreed with
subjective rating of the pilots' skills by experienced observers at the
NASA/Langley Research Center.

Conduct  of  the  Experiments 

Each session consisted of four 10-minute runs with a 5-minute break
between each run. The difficulty of the mental loading task would start at
no numbers for the first run and increase to 2-sec intervals by the fourth
run. Some subjects participated in two sessions, one without and one with
the side task. Each subject was allowed to practice all three tasks until
he felt comfortable with them.



Preliminary  Results 


Instrument dwell time histograms and the frequency of usage of
different sequences of instrument fixations were both affected by the
loading task. Both results are reported in detail elsewhere (Tole, et al.,
1982) and only the major points are mentioned here. An increase in dwell
time with increase in mental loading was observed in all subjects. This is
illustrated in figure 2. Novice subjects generally had much longer dwell
times under increased load than did skilled pilots. (Relative skill levels
are given in Table I above.) The fixation sequences of the pilot's
instrument scans were analyzed, and the percentage occurrence of the ten
most frequently occurring sequences was also analyzed. These results
indicate that: 1) skilled pilots use a higher percentage of their ten most
frequently occurring sequences than do novice pilots and 2) the scan
patterns of the novice subjects were affected more by the increase in mental
loading than were the patterns of the highly skilled pilots. This result
is shown in figure 3.

Figure 2. DWELL TIME HISTOGRAMS FOR TWO SKILLED PILOTS (#4 & #11)
AND TWO NOVICE PILOTS (#9 & #10) UNDER VARIOUS LOADING
CONDITIONS

A more  general  method  of  quantifying  the  scan 

Traditionally, much of the quantitative analysis of scanning patterns
has employed Markov transition probability matrices (Stark and Ellis, 1981;
Krebs and Wingert, 1978). Such matrices do describe the predominant
patterns in the scan via the relative sizes of transition probabilities but
it is either extremely unwieldy or impossible to compare two of these
matrices for different experimental conditions. One of the major goals of
this research is the identification of general methods for the study of
scanning behavior. To be most useful the method should be independent of
the number and arrangement of instruments. The nature of
eye-point-of-regard data (sequential instrument and dwell times) obtained
from the oculometer suggests several methods from information theory which
may have this generality.

Figure 3. PERCENT USAGE OF LENGTH 4 SEQUENCES UNDER VARYING
LOAD (TYPICAL SEQ: ATT - DG - ATT - ALT)
(axis label: LOADING TASK INTERVALS)


The piloting task in the current experiment is such that the pilot's
scan can only lie on one of the 7 specified instruments although each
fixation may be of arbitrary duration. The time history of fixations has a
form which is similar to that of a communications system which can assume 7
discrete states with a varying duration in each state. The orderliness of
such a system is related to the probabilities with which it occupies its
different states. A system which always occupied the same state or always
made the same transitions between states would thus be quite orderly. In
the case of instrument scan, these situations would be paralleled by
staring and by a stereotyped scanpath respectively.

This concept of system order may be stated compactly using the
mathematical form for entropy from information theory. The entropy of a
sequence is defined as (Shannon and Weaver, 1949):

(5)  $H_o = -\sum_{i=1}^{D} p_i \log_2 p_i$

where

$H_o$ = observed average entropy
$p_i$ = probability of sequence i occurring
D = number of different sequences in the scan
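
A minimal sketch of the entropy calculation in equation 5 is given below. It assumes that the probabilities are estimated as the relative frequencies of overlapping sequences observed in a run; the function name `sequence_entropy` and the short example scans are hypothetical.

```python
from collections import Counter
from math import log2


def sequence_entropy(fixations, length=2):
    """Observed entropy (bits/sequence) of fixation sequences.

    fixations -- instrument labels in the order they were fixated
    length    -- sequence length N (overlapping windows are counted)
    """
    sequences = [tuple(fixations[i:i + length])
                 for i in range(len(fixations) - length + 1)]
    if not sequences:
        return 0.0
    counts = Counter(sequences)
    total = len(sequences)
    # p_i is estimated as the relative frequency of sequence i.
    return -sum((c / total) * log2(c / total) for c in counts.values())


# A scan that alternates between two instruments uses only two
# distinct length-2 sequences, each with probability 0.5.
alternating = ["ATT", "DG", "ATT", "DG", "ATT"]
# A stereotyped three-instrument loop uses three equally likely pairs.
loop = ["ATT", "DG", "ALT", "ATT", "DG", "ALT", "ATT"]
```

The alternating scan yields 1 bit/sequence and the three-instrument loop yields log2(3) bits/sequence; a scan that used more distinct sequences with similar frequencies would score higher.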


In the case of the instrument scan, entropy has the units of
bits/sequence and provides a measure of the randomness (or orderliness) of
the scanpath. The higher the entropy, the more disorder is present in the
scan. The maximum possible entropy is constrained by the experimental
conditions (see below). The entropy measure uses the same probabilities
which are present in transition matrices, but it yields a single, more
compact expression for the overall behavior of the probabilities rather
than presenting them each individually. This method appears to afford some
generality and has been the focus of our recent efforts.

To implement this method, each of the instruments to be examined was
given a number. Then a sequence of these numbers was stored as the pilot
scanned the instrument panel together with the dwell time for each
fixation. While sequences of up to length 4 were considered in preliminary
analyses, the most detailed study was made on sequences of length 2. The
remainder of the discussion here applies to the results for length 2
sequences. Details of the methodology are given elsewhere (Stephens, 1981).

It can be shown that the observed entropy for the instrument scan is
related to the total number of fixation sequences (L, defined with equation
7 below) observed during a run. In order to compare entropies from the
scans of different pilots for different run lengths, each estimate of
entropy had to be corrected for L and normalized to its maximum possible
value, Hmax. Hmax may be calculated as follows. In the most general case,
M instruments may be arranged in some arbitrary fashion on the cockpit
panel. For a given number of instruments, M, and sequence length N, the
maximum number of different fixation sequences is given by:


(6)  $Q = M(M-1)^{N-1}$ = maximum number of sequences of length N

The number of bits required to uniquely encode all Q possible sequences is
log2 Q. The magnitude of this latter number also represents Hmax of the
visual scan for the number of instruments and sequence length being
considered. For example, with 7 instruments the value of Q for sequences
of 2 instruments is 36 which yields a corresponding Hmax = 5.8.
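
Equation 6 and the corresponding Hmax can be evaluated with a short sketch. The function names `max_sequences` and `hmax_bits` are hypothetical. Note that evaluating equation 6 directly for M = 7 and N = 2 gives Q = 42 (about 5.4 bits), which differs from the figures quoted in the text; those figures may reflect a different state count, and the sketch simply follows the formula as printed.

```python
from math import log2


def max_sequences(num_instruments: int, seq_length: int) -> int:
    """Q = M * (M - 1) ** (N - 1): the first fixation can fall on any of
    M instruments, each later fixation on any of the other M - 1."""
    if num_instruments < 2 or seq_length < 1:
        raise ValueError("need at least 2 instruments and length >= 1")
    return num_instruments * (num_instruments - 1) ** (seq_length - 1)


def hmax_bits(num_instruments: int, seq_length: int) -> float:
    """Maximum entropy in bits/sequence: log2 of the sequence count Q."""
    return log2(max_sequences(num_instruments, seq_length))
```

The (M - 1) factor reflects the fact that two successive fixations cannot land on the same instrument, since a repeated look would simply be recorded as one longer dwell.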

The normalized value of H may then be calculated from:

(7)  $H_{corr} = H_o \cdot \frac{H_{max}}{\log_2 L}$

where

L = R - N + 1 = number of sequences in a run
R = number of fixations in a run
N = sequence length (N = 1, 2, 3, or 4)


While entropy should help to explain the orderliness (or lack thereof)
of the scanning pattern, the development presented up to this point does
not include the fact that the dwell time for each fixation is different.
From the preliminary results on instrument dwells, it appears rather clear
that dwell times can be markedly affected during high mental loading. In
order to include the effect of time in our measure, a term for entropy rate
was defined as:

(8)  $H_{rate} = H_o / t$

where Ho is the entropy for the system given by equation 7 and t = smallest
interval in which a transition may occur.

In practice, the calculation of Hrate was an average value given by
the following:

(9)  $H_{rate_{avg}} = \sum_{i=1}^{D} \frac{H_{corr_i}}{DT_i}$

where

$H_{corr_i}$ = normalized entropy for ith sequence
$DT_i$ = average dwell time for ith sequence
D = number of different fixation sequences
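
Equation 9 is a sum of per-sequence entropy contributions, each divided by that sequence's average dwell time. The sketch below takes the per-sequence values as given inputs; the function name `average_entropy_rate` and the example numbers are hypothetical, and how each per-sequence normalized entropy is obtained is left to the methodology cited in the text.

```python
def average_entropy_rate(hcorr_by_sequence, dwell_by_sequence):
    """Average entropy rate (bits/sec) following equation 9.

    hcorr_by_sequence -- normalized entropy contribution of each distinct
                         sequence (bits)
    dwell_by_sequence -- average dwell time of each distinct sequence (sec)
    """
    if len(hcorr_by_sequence) != len(dwell_by_sequence):
        raise ValueError("one dwell time is needed per sequence")
    if any(dt <= 0 for dt in dwell_by_sequence):
        raise ValueError("dwell times must be positive")
    return sum(h / dt for h, dt in zip(hcorr_by_sequence, dwell_by_sequence))


# Hypothetical values for two distinct sequences.
example_rate = average_entropy_rate([0.5, 0.25], [0.5, 1.0])
```

Longer dwell times reduce each term of the sum, so a tendency to stare lowers the entropy rate even if the set of sequences used is unchanged.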


It is helpful to estimate the maximum value which Hrate might assume.
This may be calculated using the maximum for entropy determined above
together with dwell time statistics for the various instrument sequences in
the scan. While it is possible for pilots to make rather rapid glances
(with dwell times of 100 msec or less) at their instruments (Harris and
Christhilf, 1980), a fixation rate this high (10 fixations/sec) rapidly
leads to oculomotor fatigue. A more realistic average value is probably
about 2 fixations/sec or less for a long period of instrument scan (say >
10 sec).

Using 0.5 sec/look (2 fixations/sec) as the average dwell interval,
the maximum entropy rate for sequences of length 2 is calculated to be

$H_{rate_{max}} = \frac{5.8}{0.5 \times 2\ \text{fixations/seq}} \approx 6\ \text{bits/sec}$
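
The arithmetic behind this upper bound is a unit conversion from bits/sequence to bits/sec. The sketch below uses the values quoted in the text; the function name `max_entropy_rate` is hypothetical.

```python
def max_entropy_rate(hmax_bits_per_seq, dwell_sec_per_fixation,
                     fixations_per_seq):
    """Upper bound on entropy rate in bits/sec.

    A sequence of N fixations lasts about N * dwell seconds, so the
    bound is Hmax divided by that duration.
    """
    seq_duration_sec = dwell_sec_per_fixation * fixations_per_seq
    return hmax_bits_per_seq / seq_duration_sec


# Values from the text: Hmax = 5.8 bits/seq, 0.5 sec/look, length 2.
upper_bound = max_entropy_rate(5.8, 0.5, 2)
```

With these inputs the bound is 5.8 bits/sec, which the text rounds to 6 bits/sec.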

This number represents an upper bound. Since we suspect that the pilot
must have some regularity in his or her scan, the numbers we would expect
to obtain under actual flight conditions will probably be lower. The
observed average Hrate for the current experiments was on the order of 1
bit/sec. A tendency to stare under increased load should be reflected by
decreased entropy and increased fixation times, making Hrate tend toward
lower values under such conditions. Figure 4 plots Hrate vs number task
difficulty for all pilots except 12 and 8.


Figure  4.  ENTROPY  RATE  ON  LENGTH  2 SEQUENCES  vs. 
IMPOSED  TASK  DIFFICULTY 


A trend toward lower entropy rate with higher task difficulty may be seen.
A two-way analysis of variance was performed for the entropy rate data from
nine pilots on levels of task difficulty and between subjects. F tests
allowed rejection of two null hypotheses: equality of mean Hrate at all
loading levels (p < 0.01) and equality of mean Hrate between subjects (p <
0.01). All six combinations of level differences in mean Hrate were found
to be statistically significant (t-test, p < 0.05). Thus Hrate was chosen
to map from scanning behavior into task difficulty (i.e. workload).

The model used expresses Hrate as an exponential function of TD.

(10)  $H_{rate} = 0.9279 \exp(-TD)$

This equation was obtained via a regression analysis based on the data from
seven of the pilots with a coefficient of determination, R-squared, =
97.3%. This equation may be solved for task difficulty with the following
results:

(11)  $TD = -(0.06 + \ln H_{rate})$

This expression can then be used to predict the level of TD for a new
subject under the conditions of the experiment reported here.
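
The forward model of equation 10 and its inversion can be sketched as below. The function names are hypothetical. The sketch uses the exact algebraic inverse of equation 10, whose constant is ln(0.9279), roughly -0.075; this differs slightly from the 0.06 printed in equation 11, so estimates from the two forms will differ by a small constant offset.

```python
import math

# Regression coefficient from equation 10.
HRATE_AT_ZERO_TD = 0.9279


def entropy_rate_from_td(td: float) -> float:
    """Predicted entropy rate (bits/sec) for imposed task difficulty TD."""
    return HRATE_AT_ZERO_TD * math.exp(-td)


def td_from_entropy_rate(hrate: float) -> float:
    """Estimated task difficulty from an observed entropy rate.

    Exact inverse of equation 10: TD = ln(0.9279) - ln(Hrate).
    """
    if hrate <= 0:
        raise ValueError("entropy rate must be positive")
    return math.log(HRATE_AT_ZERO_TD) - math.log(hrate)
```

Feeding the predicted entropy rate for any of the four loading levels back through the inverse recovers the original task difficulty, which is a useful sanity check before applying the inverse to measured data.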

Model  Development  and  Verification 

One of the major goals of this work was the development of a model
relating performance, skill, and mental workload. The ultimate goal is the
prediction of performance given estimates for skill and scanning
parameters. A model relating performance, skill, and mental workload may
be postulated from the empirical relationship shown in figure 1.
Construction of the model should, in fact, aid in determining whether such
empirical expressions are valid. The model chosen was an exponential form:


(12)  $P = P(0) - \exp\left((TD/Skill)^2\right)$

This equation may be rearranged as follows:

(13)  $\exp\left((TD/Skill)^2\right) = P(0) - P$

which states that the exponential term is equal to the difference in the
performance at the no-loading level P(0) and the performance at the present
level of mental loading P. Using the values for the level of skill and
task difficulty calculated in equations 4 and 11 respectively, the left
hand side of the equation may be computed. The right hand side of the
equation must be expressed in terms of measurable performance indicators.

Expanding  the  right  side  of  (13)  yields 

(14)  $P(0) - P = a(\#TP(0) - \#TP) + b(RMS/GS(0) - RMS/GS) + c(RMS/LOC(0) - RMS/LOC) + d(\%PWR/GS(0) - \%PWR/GS) + e(\%PWR/LOC(0) - \%PWR/LOC)$


A multiple regression analysis was then performed on equation 13 using
values for each of these measures recorded during the experiments.
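The kind of multiple regression described here can be sketched with ordinary least squares. The Python example below is a minimal, dependency-free illustration and not the authors' procedure: the data are synthetic and noise-free, the "true" coefficients are arbitrary, and the helper names are hypothetical. Each row holds five performance differences (the five terms on the right of equation 14), and the response plays the role of the exponential term in equation 13.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    x = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the constant term
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data only: arbitrary coefficients (constant, a, b, c, d, e)
random.seed(0)
true_beta = [1.0, 0.04, 0.20, -0.03, 0.05, 0.01]
rows = [[random.uniform(-20.0, 20.0) for _ in range(5)] for _ in range(30)]
y = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows]

beta = least_squares(rows, y)
print([round(b, 4) for b in beta])
```

With real, noisy measurements one would also examine each coefficient's significance, as the authors do with a Student's t-test, before deciding which terms to retain.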

The data from seven pilots were used for model development, while those
from three other subjects were used for model verification. One pilot's
performance data were discarded due to equipment malfunction.

The results of the first attempt at regression indicated that the
coefficient of the %PWR/LOC term could not be differentiated from zero
based on a Student's t-test. This variable was eliminated from equation 13
and the analysis was repeated. This regression yielded non-zero values for
the coefficients a through d, and included a constant term. The resulting
equation was

(15)  EXP((TD/Skill)^2) = 1.4483 + 0.0351(%TP(0) - %TP)

      + 0.1765(RMS/GS(0) - RMS/GS) - 0.0366(RMS/LOC(0) - RMS/LOC)
      + 0.0377(%PWR/GS(0) - %PWR/GS)

This analysis had an R-squared value of 76.6 percent and an F-ratio of
12.28 (p < 0.01). The coefficients determined for equation 15 may now be
used in equation 3, which becomes

(16)  P = 1.4483 + 0.0351(%TP) + 0.1765(RMS/GS)

        - 0.0366(RMS/LOC) + 0.0377(%PWR/GS).

These coefficients provide the relative weightings for each of the
performance terms, but they need to be scaled in order to provide the proper
characteristics for the equation. If each of the terms were at its
maximum value, that is 100 percent, then the combined performance measure
should also equal 100 percent. However, using the coefficients of equation
16, the combined measure is only 22.72 when every term is at 100. To obtain
100 percent, each coefficient must be multiplied by 100/22.72 = 4.40. The
modified performance equation becomes:

(17)  P = 6.3750 + 0.1545(%TP) + 0.7769(RMS/GS) - 0.1611(RMS/LOC)

        + 0.1658(%PWR/GS)
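The scaling step can be checked numerically. The Python sketch below uses the coefficients printed in equation 16 (the dictionary keys and function name are hypothetical labels), evaluates the combined measure with every term at 100 percent, and rescales so that this case yields 100.

```python
# Coefficients as printed in equation 16; the keys are illustrative labels only
COEFFS_EQ16 = {
    "const": 1.4483,
    "TP": 0.0351,
    "RMS_GS": 0.1765,
    "RMS_LOC": -0.0366,
    "PWR_GS": 0.0377,
}


def combined(coeffs, tp, rms_gs, rms_loc, pwr_gs):
    """Weighted sum of the four performance terms plus the constant."""
    return (coeffs["const"]
            + coeffs["TP"] * tp
            + coeffs["RMS_GS"] * rms_gs
            + coeffs["RMS_LOC"] * rms_loc
            + coeffs["PWR_GS"] * pwr_gs)


# Combined measure when every term is at its maximum of 100 percent
total = combined(COEFFS_EQ16, 100.0, 100.0, 100.0, 100.0)
scale = 100.0 / total

# Rescaled coefficients, comparable to those of equation 17
scaled = {name: value * scale for name, value in COEFFS_EQ16.items()}

print(round(total, 2), round(scale, 2))
print({name: round(value, 4) for name, value in scaled.items()})
```

The rescaled values agree with equation 17 to about three decimal places; the small remaining differences are consistent with the scale factor having been rounded to 4.40 in the original computation.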

A plot of this function versus the task difficulty, obtained from equation
11, is provided in Figure 5.

It was hoped that these curves would resemble those given in the
hypothetical plot in Figure 1, and for some of the pilots a general overall
downward trend is present. Even though the curves do not match the
hypothetical ones exactly, there are some common features between them.
First of all, the curve for the lowest skilled pilot, 7, is seen to decrease
much more rapidly than the curves for the more highly skilled pilots (3,
11; the two points for 3 are for the third and highest levels of mental
loading respectively).

To test this model's value as a predictive tool, the data from three
subjects not included in the model determination were substituted into
equation 17 and plotted versus perceived task difficulty in Figure 6.

Pilots 12, 8, and 16 produce some interesting, if not consistent,
results. The three points of pilot 12 and pilot 16 are for the second,
third, and highest loading levels. All three pilots show a net decrease in
performance between their lowest and highest task difficulties, even though
they accomplished this decrease in very different ways. Pilot 8 appears
to be the closest to the theoretical model, with his sharp decrease in
performance over a very small task difficulty increase. Pilot 16, on the
other hand, appears to be decreasing at an exponentially decreasing rate, as
opposed to the model, which predicts decreasing performance at an
exponentially increasing rate. Pilot 12 increases performance sharply
between his second and third runs and then decreases just as sharply
between the third and fourth runs.

Since the choice of the exponential model for
performance/skill/workload was arbitrary, two other forms for the model
were also examined. These were circular and linear models; neither fit the
data as well as the exponential model, and hence both were abandoned.

Figure 5. Combined performance (from model) vs. perceived task
difficulty (from Eq. 10) for 7 pilots used in model development.

Figure 6. Combined performance vs. task difficulty (from Eq. 10) for 3 test
subjects.

The models described here are still under development, and work is in
progress to repeat the experiments described here and to apply this
methodology to other instrument flight scenarios.

Summary 

This paper presents some of the findings from a set of experiments
designed to explore the relationship between performance, skill, and visual
scanning behavior of aircraft pilots under varying levels of mental
workload. Instrument fixations were recorded as a group of pilots with
widely varying levels of skill simultaneously performed a constant
instrument flight task and a verbally presented loading task with 4
discrete levels. Initial results indicate a tendency of lesser skilled
pilots to stare at the primary instrument as loading is increased and to
alter the frequency of usage of different scan paths. Skilled pilots
demonstrated much less change on both of these measures.

A major finding of the research suggests that under relatively
constant instrument flight conditions the entropy rate of the visual scan
path may be a useful measure of the level of mental workload induced by a
constant rate verbal task. This measure of workload was combined with
independent estimates of performance on the piloting and verbal tasks and
of pilot skill. An exponential model relating these factors was developed
and has undergone preliminary tests. The model helps provide insight into
the intimate connections between a particular workload measure and operator
skill and performance strategy.

Question a.

Considerable research along similar lines was done under ONR/NASA
sponsorship several years ago at STI (e.g., see NASA CR 1569 and ONR
reports STI TR 163-1 and 183- ). That work invokes control theory analysis
to show that scan patterns are not completely random, but have
predictable (explainable) correlations with controlled element and task
demands. (The data show similar effects to yours.)

Answer  a. 

There certainly are some interesting parallels between our work and
earlier studies at STI as in NASA CR 1569. Both efforts reveal an
observable change in scanning behavior with varying task difficulty.
The experimental conditions are somewhat different, however, in that
the STI work uses a "critical side tracking task" which requires an
alteration in the scan. Our method (a verbal task of varying difficulty)
does not in itself require an altered scan path for its successful
performance. As the critical task difficulty increases, the dwell
times become shorter. For increased verbal loading in our experiments,
the dwell times become longer.

While these two findings are not directly comparable, they do point
out the potential utility of instrument scan in the measurement of
behavior of pilots and the need for great care in the interpretation
of scanning data within the context of a particular experiment.


Question  b. 

(a "nit") Why use the arcane terms "entropy" and "entropy rate" when the
current term (circa 1960's and on) is "transinformation index" and
rate (e.g., used by Ames references since 1960's)?

Answer  b. 

The use of the word "information" would be misleading in the context
of our experiments since we do not currently attempt to quantify the
amount of information the pilot is obtaining from his displays.
Rather, we are concerned for the moment only with the orderliness of
the scan pattern. The method used to quantify the order in the scan
was the mathematical form of entropy as presented in the original
works on information theory (Shannon & Weaver, 1949). Entropy seems
a clear enough term; "transinformation" on the other hand suggests a
broader meaning than we intend in our work reported here.
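
To make the idea of entropy as a measure of scan orderliness concrete, the Python sketch below computes a first-order Shannon entropy over a sequence of instrument fixations. This is an assumption-laden illustration, not the computation used in the study: the instrument labels are hypothetical, and converting to bits/sec by multiplying bits per fixation by fixations per second is only one simple reading of "entropy rate".

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """First-order Shannon entropy, in bits per fixation, of a fixation sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_rate(sequence, duration_s):
    """Assumed rate in bits/sec: bits per fixation times fixations per second."""
    return shannon_entropy(sequence) * len(sequence) / duration_s


# Hypothetical instrument labels and scans, for illustration only
staring = ["ATT"] * 8                          # fixating a single instrument
varied = ["ATT", "HSI", "ALT", "ASI"] * 2      # spread evenly over four instruments

print(shannon_entropy(staring))
print(shannon_entropy(varied))
print(entropy_rate(varied, duration_s=8.0))
```

Under this reading, a scan that dwells on one instrument has lower entropy than one distributed across several instruments, in line with the reported decrease in entropy rate as "staring" increases under load.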